Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
We propose firefly neural architecture descent, a general framework for progressively and dynamically growing neural networks to jointly optimize the networks' parameters and architectures. Our method works in a steepest descent fashion, which iteratively finds the best network within a functional neighborhood of the original network that includes a diverse set of candidate network structures. By using Taylor approximation, the optimal network structure in the neighborhood can be found with a greedy selection procedure. We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures that avoid catastrophic forgetting in continual learning. Empirically, firefly descent achieves promising results on both neural architecture search and continual learning. In particular, on a challenging continual image classification task, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
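The growth step the abstract describes — score candidate network structures by a first-order (Taylor) estimate of their effect on the loss, then greedily keep the best ones — can be illustrated with a toy sketch. Everything below (the one-neuron starting network, the random candidate pool, the finite-difference scoring) is an illustrative assumption, not the paper's actual implementation:

```python
import math
import random

random.seed(0)

# Toy 1D regression task standing in for the training data.
xs = [i / 10 for i in range(-10, 11)]
ys = [math.sin(3 * x) for x in xs]

def relu(z):
    return z if z > 0 else 0.0

def predict(neurons, x):
    # A one-hidden-layer net: sum_i v_i * relu(w_i * x + b_i).
    return sum(v * relu(w * x + b) for (w, b, v) in neurons)

def loss(neurons):
    return sum((predict(neurons, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def growth_gradient(neurons, w, b, eps=1e-3):
    # Central finite-difference estimate of dL/dv at v = 0 for a
    # candidate neuron (w, b): the first-order (Taylor) effect on the
    # loss of switching that neuron on.
    lp = loss(neurons + [(w, b, eps)])
    lm = loss(neurons + [(w, b, -eps)])
    return (lp - lm) / (2 * eps)

# Current small network and a pool of candidate new neurons.
net = [(1.0, 0.0, 0.5)]
candidates = [(random.uniform(-2, 2), random.uniform(-1, 1)) for _ in range(20)]

# Score every candidate by the magnitude of its first-order effect,
# then greedily grow the top-k. Each new neuron enters with a small
# output weight whose sign opposes the gradient, so to first order
# the loss can only decrease.
scored = sorted(candidates, key=lambda c: abs(growth_gradient(net, *c)),
                reverse=True)
k = 3
grown = list(net)
for (w, b) in scored[:k]:
    g = growth_gradient(grown, w, b)
    v = -1e-3 if g > 0 else 1e-3
    grown.append((w, b, v))
```

Initializing new neurons with near-zero output weights keeps the network's function essentially unchanged at the moment of growth, which is what makes the first-order (Taylor) score a faithful local criterion for which candidate to add.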
Review for NeurIPS paper: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
Clarity:
1. Clarify whether \varepsilon and \delta are learnable parameters in the same sense as the model parameters, or are only learnable during the architecture descent. In Line 116, when optimizing \varepsilon and \delta, are the neural network weights also updated?
2. "Measured by the gradient magnitude": is this the magnitude over the full batch or over a few mini-batches? How many?
5. Make sure the legend labels "Random (split)" and "RandSearch" in Figure 1(a) exactly match those that appear in the text ("RandSearch (split)" and "RandSearch (split new)"). In Figure 1(a), a simple baseline is missing: add one neuron and randomly initialize its weights. In Figure 3(b), if splitting and growing happen at the same time, the number of neurons (markers along the x-axis) should have gaps larger than 1.
Review for NeurIPS paper: Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks
Looking over the reviews, rebuttal, and subsequent discussion, I think this paper is of interest to the community. However, there are several ambiguities and issues in the presentation of the work noted by the reviewers (e.g., R3) which hamper how well the work can be understood. So I *urge* the authors to make sure all the points raised by the reviewers are considered (as promised in the rebuttal) and incorporated in the main text, and to make sure the text is clear and easy to parse. I do believe the rebuttal addresses most of the issues brought up; in particular, I think the replies to the issues raised by R1 are valid, including the additional experiments requested by that reviewer.